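AWS API Gateway with Terraform: A Deep-Dive Field Guide
API Gateway is a core component of serverless architectures on AWS. Clicking through the console is enough to get something running, but in production, Infrastructure as Code is the way to go. This article covers managing API Gateway with Terraform from the basics through advanced topics, and includes the pitfalls I have run into along the way.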
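1. REST API vs HTTP API: Pick the Right Type First
AWS offers two kinds of API Gateway, and picking the wrong one means redoing a lot of work later:
- REST API (v1): full-featured, with support for WAF, request validation, usage plans, API keys, and request/response transformation. Best suited to formal, externally facing APIs.
- HTTP API (v2): up to about 70% cheaper, with lower latency, native JWT authorization, and automatic deployment. Best suited to internal microservice traffic or simple proxying.
Rule of thumb: REST API for external consumers, HTTP API for internal ones. Both are covered below.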
2. A Complete REST API in Terraform
Let's start with a production-grade REST API backed by Lambda:
# ============================================
# REST API
# ============================================
resource "aws_api_gateway_rest_api" "main" {
  name        = "${var.project}-api"
  description = "Production REST API"

  endpoint_configuration {
    types = ["REGIONAL"] # REGIONAL / EDGE / PRIVATE
  }

  # Only matters when the API is defined from an OpenAPI document via `body`:
  # "overwrite" replaces the existing definition, "merge" merges into it
  put_rest_api_mode = "overwrite"

  tags = var.common_tags
}

# ============================================
# Resource path: /users/{userId}
# ============================================
resource "aws_api_gateway_resource" "users" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  parent_id   = aws_api_gateway_rest_api.main.root_resource_id
  path_part   = "users"
}

resource "aws_api_gateway_resource" "user_by_id" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  parent_id   = aws_api_gateway_resource.users.id
  path_part   = "{userId}"
}

# ============================================
# GET /users/{userId}
# ============================================
resource "aws_api_gateway_method" "get_user" {
  rest_api_id   = aws_api_gateway_rest_api.main.id
  resource_id   = aws_api_gateway_resource.user_by_id.id
  http_method   = "GET"
  authorization = "COGNITO_USER_POOLS"
  authorizer_id = aws_api_gateway_authorizer.cognito.id

  request_parameters = {
    "method.request.path.userId"          = true
    "method.request.header.Authorization" = true
  }

  # Request validator
  request_validator_id = aws_api_gateway_request_validator.params.id
}

# Lambda integration
resource "aws_api_gateway_integration" "get_user" {
  rest_api_id             = aws_api_gateway_rest_api.main.id
  resource_id             = aws_api_gateway_resource.user_by_id.id
  http_method             = aws_api_gateway_method.get_user.http_method
  integration_http_method = "POST" # Lambda is always invoked with POST, whatever the client's method
  type                    = "AWS_PROXY"
  uri                     = aws_lambda_function.get_user.invoke_arn

  # Integration timeout in milliseconds; 29,000 is the default maximum
  timeout_milliseconds = 29000
}

# Permission for API Gateway to invoke the Lambda
resource "aws_lambda_permission" "apigw_get_user" {
  statement_id  = "AllowAPIGatewayInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.get_user.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_api_gateway_rest_api.main.execution_arn}/*/*"
}
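The `/*/*` suffix on `source_arn` lets any stage and any method of this API invoke the function. If you want tighter least-privilege scoping, the sketch below limits the permission to `GET /users/{userId}` on any stage. It assumes the same `aws_lambda_function.get_user` as above and the `<execution_arn>/<stage>/<METHOD>/<path>` layout of the source ARN; the statement id is illustrative, and the resource is meant to be used instead of the broader permission, so check the pattern against your own routes before relying on it.

```hcl
# Hypothetical, narrower alternative to aws_lambda_permission.apigw_get_user:
# only GET requests under /users/ (on any stage) may invoke get_user.
# source_arn layout: <execution_arn>/<stage>/<HTTP method>/<resource path>
resource "aws_lambda_permission" "apigw_get_user_scoped" {
  statement_id  = "AllowAPIGatewayInvokeGetUser"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.get_user.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_api_gateway_rest_api.main.execution_arn}/*/GET/users/*"
}
```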
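3. Request Validation: Stop Bad Requests Before They Reach Lambda
Many people overlook this feature and let invalid requests hit Lambda, paying for invocations that accomplish nothing. The Terraform configuration: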
# Request validator
resource "aws_api_gateway_request_validator" "params" {
  name                        = "validate-params"
  rest_api_id                 = aws_api_gateway_rest_api.main.id
  validate_request_parameters = true
  validate_request_body       = true
}

# Request model (JSON Schema that validates the request body)
resource "aws_api_gateway_model" "create_user" {
  rest_api_id  = aws_api_gateway_rest_api.main.id
  name         = "CreateUserRequest"
  content_type = "application/json"

  schema = jsonencode({
    "$schema" = "http://json-schema.org/draft-04/schema#"
    type      = "object"
    required  = ["name", "email"]
    properties = {
      name = {
        type      = "string"
        minLength = 1
        maxLength = 100
      }
      email = {
        type    = "string"
        pattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
      }
      age = {
        type    = "integer"
        minimum = 0
        maximum = 150
      }
    }
  })
}

# POST /users, validated against the model
resource "aws_api_gateway_method" "create_user" {
  rest_api_id          = aws_api_gateway_rest_api.main.id
  resource_id          = aws_api_gateway_resource.users.id
  http_method          = "POST"
  authorization        = "COGNITO_USER_POOLS"
  authorizer_id        = aws_api_gateway_authorizer.cognito.id
  request_validator_id = aws_api_gateway_request_validator.params.id

  request_models = {
    "application/json" = aws_api_gateway_model.create_user.name
  }
}
With this in place, a request whose `name` is empty or whose `email` is malformed is rejected by API Gateway with a 400, and Lambda is never invoked.
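The deployment triggers in section 5 reference `aws_api_gateway_integration.create_user`, but POST /users is never given an integration above, and API Gateway refuses to deploy a method that has none. A minimal sketch of the missing pieces, assuming a hypothetical Lambda function `aws_lambda_function.create_user` that is not defined in this article:

```hcl
# Hypothetical backend: assumes aws_lambda_function.create_user exists elsewhere
resource "aws_api_gateway_integration" "create_user" {
  rest_api_id             = aws_api_gateway_rest_api.main.id
  resource_id             = aws_api_gateway_resource.users.id
  http_method             = aws_api_gateway_method.create_user.http_method
  integration_http_method = "POST" # Lambda is always invoked with POST
  type                    = "AWS_PROXY"
  uri                     = aws_lambda_function.create_user.invoke_arn
}

# Every backing function needs its own invoke permission
resource "aws_lambda_permission" "apigw_create_user" {
  statement_id  = "AllowAPIGatewayInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.create_user.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_api_gateway_rest_api.main.execution_arn}/*/POST/users"
}
```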
4. Cognito Authorizer
resource "aws_api_gateway_authorizer" "cognito" {
  name            = "cognito-authorizer"
  rest_api_id     = aws_api_gateway_rest_api.main.id
  type            = "COGNITO_USER_POOLS"
  identity_source = "method.request.header.Authorization"
  provider_arns   = [aws_cognito_user_pool.main.arn]

  # How long (in seconds) an authorization result is cached, avoiding repeated token validation
  authorizer_result_ttl_in_seconds = 300
}

# For more flexible authorization logic, use a Lambda authorizer
resource "aws_api_gateway_authorizer" "lambda_auth" {
  name                   = "lambda-authorizer"
  rest_api_id            = aws_api_gateway_rest_api.main.id
  type                   = "TOKEN"
  authorizer_uri         = aws_lambda_function.authorizer.invoke_arn
  authorizer_credentials = aws_iam_role.apigw_auth_invocation.arn
  identity_source        = "method.request.header.Authorization"

  # Caching: the same token does not re-invoke the Lambda within 5 minutes
  authorizer_result_ttl_in_seconds = 300
}
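The Lambda authorizer points `authorizer_credentials` at `aws_iam_role.apigw_auth_invocation`, which is not defined above. A minimal sketch of that role, assuming the authorizer function is `aws_lambda_function.authorizer`; the role and policy names are illustrative. API Gateway assumes the role and needs `lambda:InvokeFunction` on the authorizer. As an alternative, you can drop `authorizer_credentials` and grant an `aws_lambda_permission` to `apigateway.amazonaws.com` instead.

```hcl
# Trust policy: let API Gateway assume the role
data "aws_iam_policy_document" "apigw_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["apigateway.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "apigw_auth_invocation" {
  name               = "${var.project}-apigw-auth-invocation" # illustrative name
  assume_role_policy = data.aws_iam_policy_document.apigw_assume.json
}

# Allow the role to invoke only the authorizer function
resource "aws_iam_role_policy" "apigw_auth_invocation" {
  name = "invoke-authorizer"
  role = aws_iam_role.apigw_auth_invocation.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "lambda:InvokeFunction"
      Resource = aws_lambda_function.authorizer.arn
    }]
  })
}
```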
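5. Deployments and Stages: The Easiest Place to Get Burned
API Gateway's deployment model confuses a lot of people. Changed the configuration but nothing happened? Most likely no new deployment was triggered.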
# Deployment: the key is triggers
resource "aws_api_gateway_deployment" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id

  # Redeploy whenever any of these values change.
  # Caveat: an .id only changes when its resource is replaced, not on in-place updates
  triggers = {
    redeployment = sha1(jsonencode([
      aws_api_gateway_resource.users.id,
      aws_api_gateway_resource.user_by_id.id,
      aws_api_gateway_method.get_user.id,
      aws_api_gateway_method.create_user.id,
      aws_api_gateway_integration.get_user.id,
      aws_api_gateway_integration.create_user.id,
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}

# Stage
resource "aws_api_gateway_stage" "prod" {
  deployment_id = aws_api_gateway_deployment.main.id
  rest_api_id   = aws_api_gateway_rest_api.main.id
  stage_name    = "prod"

  # Access logs
  access_log_settings {
    destination_arn = aws_cloudwatch_log_group.apigw.arn
    format = jsonencode({
      requestId          = "$context.requestId"
      ip                 = "$context.identity.sourceIp"
      caller             = "$context.identity.caller"
      user               = "$context.identity.user"
      requestTime        = "$context.requestTime"
      httpMethod         = "$context.httpMethod"
      resourcePath       = "$context.resourcePath"
      status             = "$context.status"
      protocol           = "$context.protocol"
      responseLength     = "$context.responseLength"
      errorMessage       = "$context.error.message"
      integrationLatency = "$context.integration.latency"
    })
  }

  # Stage variables (can be referenced from integrations)
  variables = {
    env          = "prod"
    lambda_alias = "live"
  }

  tags = var.common_tags
}

# Method-level settings (throttling, caching)
resource "aws_api_gateway_method_settings" "prod" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  stage_name  = aws_api_gateway_stage.prod.stage_name
  method_path = "*/*"

  settings {
    # Throttling
    throttling_burst_limit = 500
    throttling_rate_limit  = 1000

    # Execution log level; requires the account-level CloudWatch Logs role
    logging_level   = "INFO"
    metrics_enabled = true

    # Caching (enable only when needed; billed extra, and the stage also
    # needs cache_cluster_enabled = true)
    caching_enabled      = false
    cache_ttl_in_seconds = 300
  }
}
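Gotcha: if you change a method or integration but the triggers don't reflect that change, the deployment is not recreated and your change never goes live. Note that an `.id` only changes when the resource is replaced, so an id-only hash also misses in-place edits such as pointing an integration at a different `uri`. This is the most common pitfall when managing API Gateway with Terraform.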
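One way to catch in-place edits as well, following the pattern the AWS provider documentation suggests, is to hash whole resources instead of their ids. The sketch below is a drop-in replacement for the deployment resource in this section (do not keep both); expect one extra redeployment on the first plan after switching. Hashing the configuration files with `filesha1()` is another option along the same lines.

```hcl
# Replacement for aws_api_gateway_deployment.main: hash entire resources,
# so any attribute change (new integration uri, different authorizer, ...)
# produces a new deployment.
resource "aws_api_gateway_deployment" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id

  triggers = {
    redeployment = sha1(jsonencode([
      aws_api_gateway_resource.users,
      aws_api_gateway_resource.user_by_id,
      aws_api_gateway_method.get_user,
      aws_api_gateway_method.create_user,
      aws_api_gateway_integration.get_user,
      aws_api_gateway_integration.create_user,
    ]))
  }

  lifecycle {
    create_before_destroy = true # new deployment exists before the old one is removed
  }
}
```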
6. Custom Domain + Route53
# ACM certificate. A REGIONAL endpoint (used below) needs the certificate in the
# same region as the API; only EDGE endpoints need it in us-east-1, created
# through an aliased provider such as aws.us_east_1 and passed as certificate_arn
resource "aws_acm_certificate" "api" {
  domain_name       = "api.example.com"
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

# Custom domain
resource "aws_api_gateway_domain_name" "api" {
  domain_name              = "api.example.com"
  regional_certificate_arn = aws_acm_certificate.api.arn

  endpoint_configuration {
    types = ["REGIONAL"]
  }
}

# Base path mapping
resource "aws_api_gateway_base_path_mapping" "api" {
  api_id      = aws_api_gateway_rest_api.main.id
  stage_name  = aws_api_gateway_stage.prod.stage_name
  domain_name = aws_api_gateway_domain_name.api.domain_name
  base_path   = "" # empty string = mapped at the root path
}

# Route53 record
resource "aws_route53_record" "api" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"

  alias {
    name                   = aws_api_gateway_domain_name.api.regional_domain_name
    zone_id                = aws_api_gateway_domain_name.api.regional_zone_id
    evaluate_target_health = true
  }
}
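With `validation_method = "DNS"`, the certificate stays pending until its validation records exist, and creating the custom domain against a certificate that is not yet issued can fail. A sketch following the DNS-validation pattern from the AWS provider docs, reusing `data.aws_route53_zone.main` from above; the record TTL is an arbitrary choice. With this in place you would point `regional_certificate_arn` at `aws_acm_certificate_validation.api.certificate_arn` so the domain waits for issuance.

```hcl
# One validation record per domain_validation_options entry
resource "aws_route53_record" "api_cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.api.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      type   = dvo.resource_record_type
      record = dvo.resource_record_value
    }
  }

  zone_id         = data.aws_route53_zone.main.zone_id
  name            = each.value.name
  type            = each.value.type
  records         = [each.value.record]
  ttl             = 60
  allow_overwrite = true
}

# Blocks until ACM reports the certificate as issued
resource "aws_acm_certificate_validation" "api" {
  certificate_arn         = aws_acm_certificate.api.arn
  validation_record_fqdns = [for r in aws_route53_record.api_cert_validation : r.fqdn]
}
```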
7. WAF Integration: A Production Essential
resource "aws_wafv2_web_acl" "api" {
  name  = "${var.project}-api-waf"
  scope = "REGIONAL"

  default_action {
    allow {}
  }

  # Rate limiting: each IP may send at most 2000 requests per 5-minute window
  rule {
    name     = "rate-limit"
    priority = 1

    action {
      block {}
    }

    statement {
      rate_based_statement {
        limit              = 2000
        aggregate_key_type = "IP"
      }
    }

    visibility_config {
      sampled_requests_enabled   = true
      cloudwatch_metrics_enabled = true
      metric_name                = "RateLimit"
    }
  }

  # AWS managed rules: SQL injection protection
  rule {
    name     = "aws-managed-sql"
    priority = 2

    override_action {
      none {}
    }

    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesSQLiRuleSet"
        vendor_name = "AWS"
      }
    }

    visibility_config {
      sampled_requests_enabled   = true
      cloudwatch_metrics_enabled = true
      metric_name                = "SQLInjection"
    }
  }

  # AWS managed rules: common attacks
  rule {
    name     = "aws-managed-common"
    priority = 3

    override_action {
      none {}
    }

    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesCommonRuleSet"
        vendor_name = "AWS"

        # Downgrade a rule that causes false positives to COUNT.
        # rule_action_override replaces excluded_rule, which AWS provider v5 removed
        rule_action_override {
          name = "SizeRestrictions_BODY"
          action_to_use {
            count {}
          }
        }
      }
    }

    visibility_config {
      sampled_requests_enabled   = true
      cloudwatch_metrics_enabled = true
      metric_name                = "CommonRules"
    }
  }

  visibility_config {
    sampled_requests_enabled   = true
    cloudwatch_metrics_enabled = true
    metric_name                = "APIGatewayWAF"
  }
}

# Associate the WAF with the API Gateway stage
resource "aws_wafv2_web_acl_association" "api" {
  resource_arn = aws_api_gateway_stage.prod.arn
  web_acl_arn  = aws_wafv2_web_acl.api.arn
}
8. HTTP API (v2): The Lightweight Option
If you don't need heavyweight features such as WAF or request validation, HTTP API is simpler and cheaper:
resource "aws_apigatewayv2_api" "http" {
  name          = "${var.project}-http-api"
  protocol_type = "HTTP"

  cors_configuration {
    allow_origins = ["https://www.example.com"]
    allow_methods = ["GET", "POST", "PUT", "DELETE", "OPTIONS"]
    allow_headers = ["Content-Type", "Authorization"]
    max_age       = 3600
  }
}

# JWT authorizer (built into HTTP API, no Lambda needed)
resource "aws_apigatewayv2_authorizer" "jwt" {
  api_id           = aws_apigatewayv2_api.http.id
  authorizer_type  = "JWT"
  identity_sources = ["$request.header.Authorization"]
  name             = "jwt-authorizer"

  jwt_configuration {
    audience = [aws_cognito_user_pool_client.main.id]
    issuer   = "https://cognito-idp.${var.region}.amazonaws.com/${aws_cognito_user_pool.main.id}"
  }
}

# Lambda integration
resource "aws_apigatewayv2_integration" "lambda" {
  api_id                 = aws_apigatewayv2_api.http.id
  integration_type       = "AWS_PROXY"
  integration_uri        = aws_lambda_function.handler.invoke_arn
  payload_format_version = "2.0" # the 2.0 event format is more compact
}

# Route
resource "aws_apigatewayv2_route" "get_users" {
  api_id             = aws_apigatewayv2_api.http.id
  route_key          = "GET /users/{userId}"
  target             = "integrations/${aws_apigatewayv2_integration.lambda.id}"
  authorization_type = "JWT"
  authorizer_id      = aws_apigatewayv2_authorizer.jwt.id
}

# Stage with automatic deployment
resource "aws_apigatewayv2_stage" "prod" {
  api_id      = aws_apigatewayv2_api.http.id
  name        = "prod"
  auto_deploy = true # route changes deploy automatically; no deployment resource to manage

  access_log_settings {
    destination_arn = aws_cloudwatch_log_group.http_api.arn
    format = jsonencode({
      requestId = "$context.requestId"
      ip        = "$context.identity.sourceIp"
      method    = "$context.httpMethod"
      path      = "$context.path"
      status    = "$context.status"
      latency   = "$context.responseLatency"
    })
  }

  default_route_settings {
    throttling_burst_limit = 500
    throttling_rate_limit  = 1000
  }
}
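Two pieces this HTTP API still needs are not shown above: the log group referenced by `access_log_settings`, and an invoke permission on the Lambda (HTTP APIs need one just like REST APIs). A minimal sketch, assuming the same `aws_lambda_function.handler`; the log group name, retention, and statement id are illustrative choices.

```hcl
# Log group referenced by the stage's access_log_settings
resource "aws_cloudwatch_log_group" "http_api" {
  name              = "/aws/apigateway/${var.project}-http-api" # illustrative name
  retention_in_days = 30
}

# Allow any route/stage of this HTTP API to invoke the handler
resource "aws_lambda_permission" "http_api" {
  statement_id  = "AllowHTTPAPIInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.handler.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_apigatewayv2_api.http.execution_arn}/*/*"
}
```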
9. Usage Plans + API Keys: Metering a Public API
If third parties will call your API, you need to throttle and meter their usage:
resource "aws_api_gateway_usage_plan" "partner" {
  name = "partner-plan"

  api_stages {
    api_id = aws_api_gateway_rest_api.main.id
    stage  = aws_api_gateway_stage.prod.stage_name
  }

  # Throttling
  throttle_settings {
    burst_limit = 100
    rate_limit  = 50
  }

  # Quota
  quota_settings {
    limit  = 10000
    period = "MONTH"
  }
}

# API key
resource "aws_api_gateway_api_key" "partner_a" {
  name    = "partner-a-key"
  enabled = true
}

# Attach the key to the usage plan
resource "aws_api_gateway_usage_plan_key" "partner_a" {
  key_id        = aws_api_gateway_api_key.partner_a.id
  key_type      = "API_KEY"
  usage_plan_id = aws_api_gateway_usage_plan.partner.id
}

# Require an API key on the method
resource "aws_api_gateway_method" "get_data" {
  rest_api_id      = aws_api_gateway_rest_api.main.id
  resource_id      = aws_api_gateway_resource.data.id
  http_method      = "GET"
  authorization    = "NONE"
  api_key_required = true # the key point: clients must send the x-api-key header
}
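`get_data` references `aws_api_gateway_resource.data`, which is not defined, and like every method it needs an integration before the API can be deployed. A sketch of the missing pieces, assuming a hypothetical `/data` path and a hypothetical `aws_lambda_function.get_data`; the output exposes the generated key so it can be handed to the partner. The new method and integration also belong in the deployment triggers, and the function needs its own `aws_lambda_permission`.

```hcl
# Hypothetical /data resource that get_data is attached to
resource "aws_api_gateway_resource" "data" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  parent_id   = aws_api_gateway_rest_api.main.root_resource_id
  path_part   = "data"
}

# Hypothetical backend: assumes aws_lambda_function.get_data exists elsewhere
resource "aws_api_gateway_integration" "get_data" {
  rest_api_id             = aws_api_gateway_rest_api.main.id
  resource_id             = aws_api_gateway_resource.data.id
  http_method             = aws_api_gateway_method.get_data.http_method
  integration_http_method = "POST"
  type                    = "AWS_PROXY"
  uri                     = aws_lambda_function.get_data.invoke_arn
}

# Generated key value; marked sensitive so it is hidden in plan/apply output
output "partner_a_api_key" {
  value     = aws_api_gateway_api_key.partner_a.value
  sensitive = true
}
```

The partner then sends the key in the `x-api-key` request header; you can read it locally with `terraform output -raw partner_a_api_key`.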
10. Modules: Organizing Large Projects
Once you have more than about 20 routes, a single file is no longer maintainable. A layout I recommend:
terraform/
├── modules/
│   └── api-route/
│       ├── main.tf          # resource + method + integration
│       ├── variables.tf
│       └── outputs.tf
├── api-gateway.tf           # REST API, deployment, stage
├── api-routes.tf            # module calls that define every route
├── api-authorizer.tf        # authorizers
├── api-domain.tf            # custom domain
├── api-waf.tf               # WAF rules
└── variables.tf
The route module:
# modules/api-route/main.tf
# (variables and outputs are shown inline here; in the layout above
#  they live in variables.tf and outputs.tf)
variable "rest_api_id" { type = string }
variable "parent_id" { type = string }
variable "path_part" { type = string }
variable "http_method" { type = string }
variable "lambda_invoke_arn" { type = string }

variable "authorizer_id" {
  type    = string
  default = null
}

variable "authorization" {
  type    = string
  default = "NONE"
}

# Each call creates its own path resource, so two methods on the same path
# (e.g. GET and POST /users) cannot both use this module as written
resource "aws_api_gateway_resource" "this" {
  rest_api_id = var.rest_api_id
  parent_id   = var.parent_id
  path_part   = var.path_part
}

resource "aws_api_gateway_method" "this" {
  rest_api_id   = var.rest_api_id
  resource_id   = aws_api_gateway_resource.this.id
  http_method   = var.http_method
  authorization = var.authorization
  authorizer_id = var.authorizer_id
}

resource "aws_api_gateway_integration" "this" {
  rest_api_id             = var.rest_api_id
  resource_id             = aws_api_gateway_resource.this.id
  http_method             = aws_api_gateway_method.this.http_method
  integration_http_method = "POST"
  type                    = "AWS_PROXY"
  uri                     = var.lambda_invoke_arn
}

output "resource_id" {
  value = aws_api_gateway_resource.this.id
}

output "method_id" {
  value = aws_api_gateway_method.this.id
}

output "integration_id" {
  value = aws_api_gateway_integration.this.id
}
Calling the module:
# api-routes.tf
module "route_get_users" {
  source            = "./modules/api-route"
  rest_api_id       = aws_api_gateway_rest_api.main.id
  parent_id         = aws_api_gateway_rest_api.main.root_resource_id
  path_part         = "users"
  http_method       = "GET"
  lambda_invoke_arn = module.lambda_get_users.invoke_arn
  authorization     = "COGNITO_USER_POOLS"
  authorizer_id     = aws_api_gateway_authorizer.cognito.id
}

module "route_create_order" {
  source            = "./modules/api-route"
  rest_api_id       = aws_api_gateway_rest_api.main.id
  parent_id         = aws_api_gateway_rest_api.main.root_resource_id
  path_part         = "orders"
  http_method       = "POST"
  lambda_invoke_arn = module.lambda_create_order.invoke_arn
  authorization     = "COGNITO_USER_POOLS"
  authorizer_id     = aws_api_gateway_authorizer.cognito.id
}
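Once routes live inside modules, the deployment can only see them through module outputs. A sketch of how the deployment in `api-gateway.tf` might collect them, assuming the two module calls above; it takes the place of the deployment resource from section 5, and the same caveat applies: ids only change when a resource is replaced. Note also that the module does not create `aws_lambda_permission`, so each route's function still needs one.

```hcl
# api-gateway.tf: redeploy when any route module's resources are created or replaced
resource "aws_api_gateway_deployment" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id

  triggers = {
    redeployment = sha1(jsonencode([
      module.route_get_users.resource_id,
      module.route_get_users.method_id,
      module.route_get_users.integration_id,
      module.route_create_order.resource_id,
      module.route_create_order.method_id,
      module.route_create_order.integration_id,
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}
```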
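11. Common Pitfalls
- Deployment not updating: the triggers don't include the resource you changed, so Terraform concludes no redeployment is needed. Put the ids of every method and integration into triggers, and remember that ids only change when a resource is replaced; to catch in-place edits too, hash the whole resources (or the .tf files).
- Lambda permission errors: you forgot aws_lambda_permission, so API Gateway is not allowed to invoke the function. Clients typically get a 500, and the execution log shows "Invalid permissions on Lambda function".
- CORS not working: a REST API needs a manually configured OPTIONS method with a MOCK integration that returns the CORS headers, and with proxy integrations the Lambda must add those headers to its own responses as well; an HTTP API only needs cors_configuration.
- Referencing stage variables: in an integration URI you write ${stageVariables.xxx}, but Terraform treats that as its own interpolation. Fix: escape it as $${stageVariables.xxx}.
- Binary media types: a REST API treats payloads as text by default. Set binary_media_types on aws_api_gateway_rest_api (["*/*"] treats every content type as binary; listing specific types such as image/png is narrower), and have the Lambda return isBase64Encoded: true.
- 29-second timeout: the default maximum integration timeout is 29 seconds. Since mid-2024, Regional and private REST APIs can request a higher limit through a service quota increase; edge-optimized REST APIs stay at 29 seconds and HTTP APIs at 30. For long-running work, go asynchronous: API Gateway → Lambda (async invocation) → client polls for the result.
- Payload size limits: REST API request and response bodies are capped at 10 MB, but with a Lambda integration the effective limit is Lambda's 6 MB synchronous payload limit. For anything larger, consider S3 presigned URLs.
- terraform destroy ordering: the WAF association must be destroyed before the stage, otherwise the destroy can hang. Referencing aws_api_gateway_stage.prod.arn in resource_arn already gives Terraform that ordering; if you build the ARN as a plain string, declare the dependency explicitly with depends_on.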
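For the REST API CORS case in the list above, here is a minimal sketch of an OPTIONS preflight on `/users` answered entirely by API Gateway with a MOCK integration. The allowed origin, methods, and headers are placeholder values mirroring the HTTP API example; adjust them to your front end, and add these resources to the deployment triggers so they actually get deployed.

```hcl
# OPTIONS /users: answered by API Gateway itself (MOCK), no Lambda involved
resource "aws_api_gateway_method" "users_options" {
  rest_api_id   = aws_api_gateway_rest_api.main.id
  resource_id   = aws_api_gateway_resource.users.id
  http_method   = "OPTIONS"
  authorization = "NONE" # preflight requests carry no credentials
}

resource "aws_api_gateway_integration" "users_options" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  resource_id = aws_api_gateway_resource.users.id
  http_method = aws_api_gateway_method.users_options.http_method
  type        = "MOCK"
  request_templates = {
    "application/json" = jsonencode({ statusCode = 200 })
  }
}

# Declare which headers the 200 response may carry
resource "aws_api_gateway_method_response" "users_options_200" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  resource_id = aws_api_gateway_resource.users.id
  http_method = aws_api_gateway_method.users_options.http_method
  status_code = "200"
  response_parameters = {
    "method.response.header.Access-Control-Allow-Origin"  = true
    "method.response.header.Access-Control-Allow-Methods" = true
    "method.response.header.Access-Control-Allow-Headers" = true
  }
}

# Fill in the header values; they are string literals, hence the inner single quotes
resource "aws_api_gateway_integration_response" "users_options_200" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  resource_id = aws_api_gateway_resource.users.id
  http_method = aws_api_gateway_method.users_options.http_method
  status_code = aws_api_gateway_method_response.users_options_200.status_code
  response_parameters = {
    "method.response.header.Access-Control-Allow-Origin"  = "'https://www.example.com'"
    "method.response.header.Access-Control-Allow-Methods" = "'GET,POST,OPTIONS'"
    "method.response.header.Access-Control-Allow-Headers" = "'Content-Type,Authorization'"
  }

  depends_on = [aws_api_gateway_integration.users_options]
}
```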
12. Monitoring and Alerting
# CloudWatch log group
resource "aws_cloudwatch_log_group" "apigw" {
  name              = "/aws/apigateway/${var.project}"
  retention_in_days = 30
}

# 5xx error alarm
resource "aws_cloudwatch_metric_alarm" "api_5xx" {
  alarm_name          = "${var.project}-api-5xx"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "5XXError"
  namespace           = "AWS/ApiGateway"
  period              = 300
  statistic           = "Sum"
  threshold           = 10
  alarm_description   = "API Gateway 5xx errors exceeded threshold"

  dimensions = {
    ApiName = aws_api_gateway_rest_api.main.name
    Stage   = aws_api_gateway_stage.prod.stage_name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}

# Latency alarm
resource "aws_cloudwatch_metric_alarm" "api_latency" {
  alarm_name          = "${var.project}-api-latency"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "Latency"
  namespace           = "AWS/ApiGateway"
  period              = 300
  extended_statistic  = "p99"
  threshold           = 5000 # 5 seconds (Latency is reported in milliseconds)
  alarm_description   = "API Gateway p99 latency exceeded 5s"

  dimensions = {
    ApiName = aws_api_gateway_rest_api.main.name
    Stage   = aws_api_gateway_stage.prod.stage_name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}
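Two prerequisites are not shown above: the `aws_sns_topic.alerts` topic the alarms publish to, and the account-level CloudWatch Logs role that REST API execution logging (`logging_level`) and access logging depend on. A sketch with illustrative names, using the AWS managed policy `AmazonAPIGatewayPushToCloudWatchLogs`. `aws_api_gateway_account` is a per-region, account-wide setting, so manage it in exactly one Terraform stack.

```hcl
# Topic the alarms publish to (add email/chat subscriptions separately)
resource "aws_sns_topic" "alerts" {
  name = "${var.project}-api-alerts" # illustrative name
}

# Role that API Gateway assumes to write logs to CloudWatch
data "aws_iam_policy_document" "apigw_logs_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["apigateway.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "apigw_cloudwatch" {
  name               = "${var.project}-apigw-cloudwatch" # illustrative name
  assume_role_policy = data.aws_iam_policy_document.apigw_logs_assume.json
}

resource "aws_iam_role_policy_attachment" "apigw_cloudwatch" {
  role       = aws_iam_role.apigw_cloudwatch.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"
}

# Region-wide API Gateway setting: one per account and region
resource "aws_api_gateway_account" "this" {
  cloudwatch_role_arn = aws_iam_role.apigw_cloudwatch.arn
}
```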
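That covers managing AWS API Gateway with Terraform end to end: type selection, deployment, security, and monitoring, the aspects a production environment has to address. The core takeaways: REST API is full-featured but verbose to configure, while HTTP API is lean but limited; the deployment's triggers must cover every change; and WAF plus monitoring are standard equipment in production.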